This repository has been archived by the owner on Mar 31, 2023. It is now read-only.

[Scalability] Enable Horizontal Pod Autoscaler (HPA) for Alcor deployment #679

Open: wants to merge 15 commits into base: master

kevin-zhonghao commented Aug 15, 2021

  1. Configure the cluster and network to meet the Metrics Server requirements:

    • Metrics Server must be reachable from kube-apiserver by container IP address (or node IP if hostNetwork is enabled).
    • The kube-apiserver must have the aggregation layer enabled.
    • Nodes must have Webhook authentication and authorization enabled.
    • The kubelet certificate must be signed by the cluster Certificate Authority (or certificate validation must be disabled by passing --kubelet-insecure-tls to Metrics Server).
    • The container runtime must implement the container metrics RPCs (or have cAdvisor support).
  2. Deploy Metrics Server in the cluster.

  3. Deploy a Horizontal Pod Autoscaler for each deployment with a customized HPA YAML file.
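
For context, here is a minimal Python sketch (not part of this PR) of how the HPA turns a metric reading from Metrics Server into a replica count, following the formula in the Kubernetes documentation: `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, skipped when the ratio is within a tolerance of 1.0. The function name and the `min_replicas`/`max_replicas` defaults are illustrative assumptions, not values from this PR.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,    # assumed bound, for illustration only
                     max_replicas: int = 10,   # assumed bound, for illustration only
                     tolerance: float = 0.1) -> int:
    """Simplified model of the HPA replica calculation."""
    ratio = current_metric / target_metric
    # Close enough to the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    proposed = math.ceil(current_replicas * ratio)
    # Clamp to the minReplicas/maxReplicas range of the HPA spec.
    return max(min_replicas, min(max_replicas, proposed))
```

For example, 4 replicas averaging 90% CPU against a 60% target would be scaled to 6, while a reading of 63% falls inside the tolerance and leaves the count at 4.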

Comment on lines +19 to +24
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
Contributor

@kevin-zhonghao Kevin, where are those metrics stored? Is it possible to visualize those metrics somewhere?

Contributor Author

@xieus Metrics Server collects resource metrics from kubelets and exposes them in the Kubernetes apiserver through the Metrics API for use by the Horizontal Pod Autoscaler. It does not store any data; it is more like an API for getting the current resource usage.

Contributor

Contributor Author

@cj-chung It is a little similar. In general, we don't use metrics-server as a monitoring solution or as a source of metrics for a monitoring solution. Currently it is used only by HPA.

Contributor

xieus left a comment

@kevin-zhonghao Thanks. A few comments.

policies:
- type: Percent
  value: 100
  periodSeconds: 15
Contributor

Does periodSeconds mean the time interval to check the percentage number?

Contributor Author

> Does periodSeconds mean the time interval to check the percentage number?

Not really. `periodSeconds: 15` above means the autoscaler may remove up to 100% of the pods within any 15-second period.
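
To make that concrete, here is a simplified Python sketch (not part of this PR) of how a `Percent` scale-down policy caps the change within one period. It assumes the caller already knows the replica count at the start of the period; the real controller derives that from its record of recent scale events. The function names are hypothetical.

```python
def scale_down_floor(replicas_at_period_start: int, percent: int) -> int:
    """Lowest replica count a Percent policy allows within one period.

    Integer division rounds the remaining replicas down, so the number
    of pods that may be removed is effectively rounded up.
    """
    return replicas_at_period_start * (100 - percent) // 100


def limit_scale_down(replicas_at_period_start: int, desired: int, percent: int) -> int:
    """Clamp a desired (lower) replica count to what the policy permits."""
    return max(desired, scale_down_floor(replicas_at_period_start, percent))
```

With `value: 100` the floor is 0, so the policy itself never holds back a scale-down (minReplicas still applies). With a hypothetical `value: 50`, going from 10 replicas toward 2 would stop at 5 for the first period.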

  # The autoscaler will choose the strategy that affects the minimum number of Pods
  selectPolicy: Min
scaleUp:
  stabilizationWindowSeconds: 0
Contributor

The value of stabilizationWindowSeconds differs between the scaleUp and scaleDown policies. Is it best practice to set stabilizationWindowSeconds = 0? Does 0 mean that the autoscaler will always respond to changes immediately?

Contributor Author

The stabilization window restricts replica flapping when the metrics used for scaling keep fluctuating.
When the metrics indicate that the target should be scaled down, the algorithm looks at the desired states computed earlier and uses the highest value within the specified interval.

For example, here we set

'scaleUp:
stabilizationWindowSeconds: 0'

so the pods are scaled up immediately when needed, and we set

'scaleDown:
stabilizationWindowSeconds: 300'

so when the current metrics indicate that the pods could be scaled down, HPA considers the desired states from the past 300 seconds before deciding whether to scale down now.
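
A rough Python sketch (not part of this PR) of the scale-down side of this behavior: keep the recent recommendations and act on the highest one still inside the window. The class name and the explicit `now` timestamps are assumptions made for illustration; the real controller keeps this history internally.

```python
from collections import deque


class ScaleDownStabilizer:
    """Simplified model of the HPA scale-down stabilization window."""

    def __init__(self, window_seconds: int = 300):
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp in seconds, recommended replicas)

    def stabilize(self, now: float, recommendation: int) -> int:
        """Record a recommendation and return the highest one in the window."""
        self._history.append((now, recommendation))
        cutoff = now - self.window_seconds
        # Drop recommendations that are older than the window.
        while self._history and self._history[0][0] < cutoff:
            self._history.popleft()
        return max(replicas for _, replicas in self._history)
```

With a 300-second window, a recommendation of 10 replicas at t=0 keeps the result at 10 even if the metrics suggest 4 at t=60; only once the older value ages out does the result drop to 4. With a window of 0, each call simply returns the current recommendation, which matches the immediate response configured for scaleUp.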

xieus commented Oct 3, 2021

Preliminary tests in the Medina cluster appear quite promising.

The current 0.19 release already contains a lot of new features, so we decided to move this feature to the 12/30 release.

xieus commented Feb 1, 2022

@yanmo96 We need to test this PR and get it merged by 2/28.
